METHOD AND APPARATUS FOR ENCODING/DECODING AUDIO DATA WITH SCALABILITY

This application claims the priority of Korean Patent Application No. 2002-80320, filed December 16, 2002, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to coding and decoding audio data, and more particularly, to a method and apparatus for coding audio data so that the coded audio bitstream has a scalable bitrate, and a method and apparatus for decoding the audio data.

2. Description of the Related Art 

Due to recent developments in digital signal processing technology, audio signals are now in most cases stored and reproduced as digital data. Digital audio storage/reproduction apparatuses transform audio signals into pulse code modulation (PCM) audio data, i.e., digital signals, through sampling and quantization. The digital audio storage/reproduction apparatus stores the PCM audio data in an information storage medium such as a compact disc (CD) or a digital versatile disc (DVD), and reproduces the stored signal in response to a user's command so that the user can listen to the audio. Digital storage/reproduction greatly improves audio quality compared to analog methods using a long-playing (LP) record or magnetic tape, and dramatically reduces the deterioration caused by long storage periods. However, the large amount of digital data makes storage and transmission difficult.

To solve this problem, a variety of compression methods are used to 
compress digital audio signals. 

In Moving Picture Experts Group (MPEG)/audio, standardized by the International Organization for Standardization (ISO), and in AC-2/AC-3, developed by Dolby, the amount of data is reduced using psychoacoustic models. As a result, the amount of data can be reduced efficiently regardless of the characteristics of a signal. That is, the MPEG/audio standard or the AC-2/AC-3 method can provide almost the same audio quality as a CD at a bitrate of only 64-384 Kbps, which is 1/6 to 1/8 of that of the previous digital encoding method.
In these methods, however, quantization and encoding are performed after searching for an optimal state suited to a fixed bitrate. Accordingly, if the transmission bandwidth drops because of poor network conditions while bitstreams are transmitted through the network, cut-offs may occur and appropriate service can no longer be provided to the user. In addition, when the bitstream needs to be converted into a smaller bitstream better suited to a mobile apparatus with limited storage capacity, a re-encoding process must be performed to reduce the size of the bitstream, which increases the amount of computation required.

To solve this problem, the applicant of the present invention filed Korean Patent Application No. 97-61298 on Nov. 19, 1997, entitled "Bitrate Scalable Audio Encoding/Decoding Method and Apparatus Using Bit-Sliced Arithmetic Coding (BSAC)", for which Korean Patent No. 261253 was granted on Apr. 17, 2000. According to the BSAC technique, a bitstream coded at a high bitrate can be made into a bitstream with a low bitrate, and restoration is possible with only part of the bitstream. Accordingly, when the network is overloaded, the performance of a decoder is poor, or a user requests a low bitrate, services with some degree of audio quality can be provided to the user using only part of the bitstream, although the quality inevitably decreases in proportion to the decrease in bitrate.

However, since the BSAC technique adopts arithmetic coding, its complexity is high, and implementing it in an actual apparatus increases cost. In addition, since the BSAC technique uses a modified discrete cosine transform (MDCT) to transform the audio signal, audio quality in a lower layer may deteriorate severely.

SUMMARY OF THE INVENTION 
The present invention provides a method and apparatus for encoding/decoding audio data with scalability, by which fine grain scalability (FGS) is provided with lower complexity.

According to an aspect of the present invention, there is provided a method for coding audio data with scalability, comprising: slicing audio data so that the sliced audio data corresponds to a plurality of layers; obtaining scale band information and coding band information corresponding to each of the plurality of layers; coding additional information containing scale factor information and coding model information based on the scale band information and coding band information corresponding to a first layer; obtaining quantized samples by quantizing audio data corresponding to the first layer with reference to the scale factor information; coding the obtained plurality of quantized samples in units of symbols, in order from a symbol formed with most significant bits (MSB) down to a symbol formed with least significant bits (LSB), by referring to the coding model information; and repeating the above steps, incrementing the layer index by one each time, until coding of the plurality of layers is finished.

Before the coding of the additional information, the method may further comprise obtaining a bit range allowed in each of the plurality of layers, wherein, in the coding of the obtained plurality of quantized samples, the number of coded bits is counted; if the counted number of bits exceeds the bit range of the corresponding layer, coding is stopped; and if the counted number of bits is still less than that bit range after all the quantized samples are coded, bits that remained uncoded after coding in a lower layer was finished are coded to the extent that the bit range permits.

The slicing of audio data comprises performing a wavelet transform of 
audio data, and slicing the wavelet-transformed data by referring to a cut-off 
frequency so that the sliced data corresponds to the plurality of layers. 

The coding of the plurality of quantized samples comprises mapping the plurality of quantized samples on a bit plane, and coding the samples in units of symbols within the bit range allowed in the layer corresponding to the samples, in order from a symbol formed with MSBs down to a symbol formed with LSBs. In the mapping of the plurality of quantized samples, K quantized samples are mapped on a bit plane, and in the coding of the samples, a scalar value corresponding to the symbol formed with K-bit binary data is obtained, and Huffman coding is performed by referring to the K-bit binary data, the obtained scalar value, and a scalar value corresponding to a symbol higher than the current symbol on the bit plane, where K is an integer.

According to another aspect of the present invention, there is provided a decoding method comprising: differential-decoding additional information containing scale factor information and coding model information corresponding to a first layer; Huffman-decoding audio data in units of symbols, in order from a symbol formed with MSBs down to a symbol formed with LSBs, and obtaining quantized samples, by referring to the coding model information; inversely quantizing the obtained quantized samples by referring to the scale factor information; inversely MDCT-transforming the inversely quantized samples; and repeating the above steps, incrementing the layer index by one each time, until decoding of a predetermined plurality of layers is finished.

The Huffman-decoding of audio data comprises decoding audio data in units of symbols within the bit range allowed in the layer corresponding to the audio data, in order from a symbol formed with MSBs down to a symbol formed with LSBs, and obtaining quantized samples from a bit plane on which the decoded symbols are arranged.

In the decoding of audio data, a 4*K bit plane formed with the decoded symbols is obtained, and in the obtaining of quantized samples, K quantized samples are obtained from the 4*K bit plane, where K is an integer.

According to another aspect of the present invention, there is provided an apparatus for decoding audio data that is coded in a layered structure with scalability, comprising: an unpacking unit which decodes additional information containing scale factor information and coding model information corresponding to a first layer and, by referring to the coding model information, decodes audio data in units of symbols in order from a symbol formed with MSBs down to a symbol formed with LSBs and obtains quantized samples; an inverse quantization unit which inversely quantizes the obtained quantized samples by referring to the scale factor information; and an inverse transformation unit which inverse-transforms the inversely quantized samples.

The unpacking unit decodes audio data in units of symbols within the bit range allowed in the layer corresponding to the audio data, in order from a symbol formed with MSBs down to a symbol formed with LSBs, and obtains quantized samples from a bit plane on which the decoded symbols are arranged.

The unpacking unit obtains a 4*K bit plane formed with the decoded symbols and then obtains K quantized samples from the 4*K bit plane, where K is an integer.

According to another aspect of the present invention, there is provided an apparatus for coding audio data with scalability, comprising: a transformation unit which MDCT-transforms the audio data; a quantization unit which quantizes the MDCT-transformed audio data corresponding to each layer by referring to scale factor information, and outputs quantized samples; and a packing unit which differential-codes additional information containing the scale factor information and coding model information corresponding to each layer, and Huffman-codes the plurality of quantized samples from the quantization unit in units of symbols, in order from a symbol formed with most significant bits (MSB) down to a symbol formed with least significant bits (LSB), by referring to the coding model information.

The packing unit obtains scale band information and coding band information corresponding to each of the plurality of layers, and codes additional information containing scale factor information and coding model information based on the scale band information and coding band information corresponding to each layer.

The packing unit counts the number of coded bits; if the counted number exceeds the bit range of the corresponding layer, it stops coding, and if the counted number is still less than that bit range after all the quantized samples are coded, it codes the bits that remained uncoded after coding in a lower layer was finished, to the extent that the bit range permits.
The packing unit slices the MDCT-transformed data by referring to a cut-off frequency so that the sliced data corresponds to the plurality of layers.

The packing unit maps a plurality of quantized samples on a bit plane, and codes the samples in units of symbols within the bit range allowed in the layer corresponding to the samples, in order from a symbol formed with MSBs down to a symbol formed with LSBs.

The packing unit maps K quantized samples on a bit plane, obtains a scalar value corresponding to the symbol formed with K-bit binary data, and then performs Huffman coding by referring to the K-bit binary data, the obtained scalar value, and a scalar value corresponding to a symbol higher than the current symbol on the bit plane, where K is an integer.

BRIEF DESCRIPTION OF THE DRAWINGS 
The above objects and advantages of the present invention will become more apparent by describing in detail preferred embodiments thereof with reference to the attached drawings in which:

FIG. 1 is a block diagram of an encoding apparatus according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram of a decoding apparatus according to a preferred embodiment of the present invention;

FIG. 3 is a diagram of the structure of a frame which forms a bitstream coded in a layered structure so that the bitrate can be controlled;

FIG. 4 is a detailed diagram of the structure of additional information;

FIG. 5 is a reference diagram to explain schematically an encoding method according to the present invention;

FIG. 6 is a reference diagram to explain more specifically an encoding method according to the present invention;

FIG. 7 is a flowchart for explaining an encoding method according to a preferred embodiment of the present invention;

FIG. 8 is a flowchart for explaining a decoding method according to a preferred embodiment of the present invention; and

FIG. 9 is a flowchart for explaining a decoding method according to another preferred embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Referring to FIG. 1, an encoding apparatus according to the present invention codes audio data in a layered structure so that the bitrate of the coded bitstream can be controlled, and comprises a transformation unit 11, a psychoacoustic unit 12, a quantization unit 13, and a bit packing unit 14.

The transformation unit 11 receives pulse code modulation (PCM) audio data, which is a time domain audio signal, and transforms the signal into a frequency domain signal, referring to information on a psychoacoustic model provided by the psychoacoustic unit 12. In the time domain, the characteristics of audio signals that a human can perceive do not differ much from those that cannot be perceived; in the frequency domain audio signals obtained through the transformation, however, there is a large difference between the characteristics of a signal that a human can perceive and those of a signal that a human cannot perceive. Accordingly, compression efficiency can be increased by allocating different numbers of bits to the respective frequency bands. In the present embodiment, the transformation unit 11 performs a modified discrete cosine transform (MDCT).
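As an illustration of the MDCT performed by the transformation unit 11, the following is a minimal direct (O(N²)) Python sketch of the textbook MDCT and inverse MDCT; it is not the implementation of the present embodiment. The sine window, the 1/N scaling of the inverse transform, and the helper names `mdct`, `imdct`, and `analyze_synthesize` are assumptions chosen for illustration. With 50%-overlapping frames, overlap-adding the windowed inverse transforms cancels the time-domain aliasing and recovers the input wherever two frames overlap.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    # Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: List[float]) -> List[float]:
    # Textbook MDCT: 2N time samples -> N frequency coefficients.
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs: List[float]) -> List[float]:
    # Inverse MDCT with 1/N scaling: N coefficients -> 2N time-aliased samples.
    n_half = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)) / n_half
        for n in range(2 * n_half)
    ]


def analyze_synthesize(x: List[float], n_half: int) -> List[float]:
    # Windowed MDCT/IMDCT with hop size N, followed by overlap-add.
    w = sine_window(2 * n_half)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n_half + 1, n_half):
        frame = [x[start + n] * w[n] for n in range(2 * n_half)]
        y = imdct(mdct(frame))
        for n in range(2 * n_half):
            # Factor 2 offsets the 1/2 gain left by the 1/N IMDCT scaling
            # when the squared window halves sum to 1.
            out[start + n] += 2.0 * y[n] * w[n]
    return out


x = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
y = analyze_synthesize(x, 8)
# Samples 8..23 are covered by two frames, so aliasing cancels there.
print(max(abs(y[i] - x[i]) for i in range(8, 24)) < 1e-9)
```

Only the samples covered by two overlapping frames are reconstructed; the first and last half-frames would need neighbouring frames in a real stream.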

The psychoacoustic unit 12 provides information on a psychoacoustic model, such as attack sensing information, to the transformation unit 11, and groups the audio signals transformed by the transformation unit 11 into signals of appropriate subbands. The psychoacoustic unit 12 also calculates a masking threshold in each subband using the masking effect caused by interactions between the respective signals, and provides the threshold values to the quantization unit 13. The masking threshold is the maximum size of a signal that cannot be perceived by a human because of the interaction between audio signals. In the present embodiment, the psychoacoustic unit 12 calculates masking thresholds of stereo components using the binaural masking level difference (BMLD).

The quantization unit 13 scalar-quantizes the audio signal in each band, based on the scale factor information corresponding to the audio signal, so that the size of the quantization noise in the band is less than the masking threshold provided by the psychoacoustic unit 12 and a human therefore cannot perceive the noise. The quantization unit 13 then outputs the quantized samples. That is, using the masking threshold calculated in the psychoacoustic unit 12 and the noise-to-mask ratio (NMR), which is the ratio of the noise generated in each band to the masking threshold of that band, the quantization unit 13 performs quantization so that the NMR values are 0 dB or less in all bands. An NMR value of 0 dB or less means that a human cannot perceive the quantization noise.
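The NMR criterion above can be sketched as a search for the coarsest quantization step that keeps a band at or below 0 dB. This is a minimal illustration, not the quantization unit 13 itself: the candidate step sizes, the plain rounding quantizer, and treating the masking threshold as a noise energy summed over the band are assumptions, and the mapping from scale factors to step sizes is not modelled.

```python
import math
from typing import List, Optional, Sequence


def quantize(band: Sequence[float], step: float) -> List[int]:
    # Uniform scalar quantization to the nearest multiple of `step`
    # (Python's round() sends exact halves to the even integer).
    return [round(v / step) for v in band]


def nmr_db(band: Sequence[float], step: float, masking_threshold: float) -> float:
    # Noise-to-mask ratio: quantization noise energy over masking threshold, in dB.
    noise = sum((v - q * step) ** 2 for v, q in zip(band, quantize(band, step)))
    if noise == 0.0:
        return float("-inf")
    return 10.0 * math.log10(noise / masking_threshold)


def coarsest_transparent_step(band: Sequence[float], masking_threshold: float,
                              candidate_steps: Sequence[float]) -> Optional[float]:
    # Largest candidate step whose noise stays at or below the mask (NMR <= 0 dB).
    ok = [s for s in candidate_steps if nmr_db(band, s, masking_threshold) <= 0.0]
    return max(ok) if ok else None


band = [1.0, -2.0, 0.5]
print(coarsest_transparent_step(band, 1.0, [0.5, 1.0, 2.0, 4.0]))  # 1.0
```

A coarser step spends fewer bits, so choosing the largest step that still satisfies NMR <= 0 dB is the natural trade-off this criterion expresses.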

The bit packing unit 14 codes the quantized samples belonging to each layer and the additional information, and packs the coded signal in a layered structure. The additional information includes the scale band information, coding band information, scale factor information, and coding model information of each layer. The scale band information and coding band information may be packed as header information and then transmitted to a decoding apparatus. Alternatively, the scale band information and coding band information may be coded and packed as additional information for each layer and then transmitted to a decoding apparatus. In some cases, the scale band information and coding band information are not transmitted to the decoding apparatus at all, because they are pre-stored in the decoding apparatus.

More specifically, the bit packing unit 14 codes the additional information containing scale factor information and coding model information corresponding to a first layer, and codes the quantized samples of that layer in units of symbols, in order from a symbol formed with most significant bits (MSBs) down to a symbol formed with least significant bits (LSBs), referring to the coding model information corresponding to the first layer. The same process is then repeated for the second layer. That is, coding proceeds layer by layer, incrementing the layer index, until the coding of a plurality of predetermined layers is finished. In the present embodiment, the bit packing unit 14 differential-codes the scale factor information and the coding model information, and Huffman-codes the quantized samples. The layered structure of bitstreams coded according to the present invention will be explained later.

Scale band information is information for performing quantization more appropriately according to the frequency characteristics of an audio signal. When a frequency area is divided into a plurality of bands and an appropriate scale factor is allocated to each band, the scale band information indicates the scale bands corresponding to each layer. Thus, each layer belongs to at least one scale band, and each scale band has one allocated scale factor. Likewise, coding band information is information for performing coding more appropriately according to the frequency characteristics of an audio signal. When a frequency area is divided into a plurality of bands and an appropriate coding model is assigned to each band, the coding band information indicates the coding bands corresponding to each layer. The scale bands and coding bands are divided empirically, and the scale factors and coding models corresponding to them, respectively, are determined on that basis.
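Because each scale band carries one scale factor and each coding band one coding model, the encoder and decoder both need to know which band a given spectral line falls in. A minimal sketch using a binary search over band start positions follows; the band edges are hypothetical (the description says only that bands are divided empirically), and `band_index` is an illustrative helper name.

```python
from bisect import bisect_right
from typing import Sequence


def band_index(line: int, band_starts: Sequence[int]) -> int:
    # band_starts: ascending first spectral line of each band, starting at 0.
    if line < band_starts[0]:
        raise ValueError("spectral line lies below the first band")
    return bisect_right(band_starts, line) - 1


# Hypothetical layout: scale bands are coarser than coding bands.
SCALE_BAND_STARTS = [0, 16, 32, 48]
CODING_BAND_STARTS = [0, 8, 16, 24, 32, 40, 48]

for line in (0, 9, 20, 47):
    # Prints: line, scale band index, coding band index.
    print(line, band_index(line, SCALE_BAND_STARTS), band_index(line, CODING_BAND_STARTS))
```

Looking up the scale factor for a line then amounts to indexing a per-band list with `band_index(line, SCALE_BAND_STARTS)`, and likewise for the coding model.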

FIG. 2 is a block diagram of a decoding apparatus according to a 
preferred embodiment of the present invention. 

Referring to FIG. 2, the decoding apparatus decodes bitstreams up to a target layer determined by the network conditions, the performance of the decoding apparatus, and the user's selection, so that the bitrate of a bitstream can be controlled. The decoding apparatus comprises an unpacking unit 21, an inverse quantization unit 22, and an inverse transformation unit 23.

The unpacking unit 21 unpacks bitstreams up to the target layer and decodes the bitstream in each layer. That is, the additional information containing scale factor information and coding model information corresponding to each layer is decoded, and then, based on the obtained coding model information, the coded quantized samples belonging to the layer are decoded and the quantized samples are restored. In the present embodiment, the unpacking unit 21 differential-decodes the scale factor information and coding model information and Huffman-decodes the coded quantized samples.

Meanwhile, the scale band information and coding band information are obtained from the header information of a bitstream or by decoding the additional information in each layer. Alternatively, the decoding apparatus may store the scale band information and coding band information in advance. The inverse quantization unit 22 inversely quantizes and restores the quantized samples in each layer according to the scale factor information corresponding to the samples. The inverse transformation unit 23 frequency/time-maps the restored samples to transform them into time domain PCM audio data, and outputs the result. In the present embodiment, the inverse transformation unit 23 performs MDCT-based inverse transformation.

FIG. 3 is a diagram of the structure of a frame which forms a bitstream 
coded in a layered structure so that the bitrate can be controlled. 

Referring to FIG. 3, the frame of a bitstream according to the present invention is coded by mapping quantized samples and additional information to a layered structure in order to obtain fine grain scalability (FGS). In other words, in the layered structure, a lower layer bitstream is included in an enhancement layer bitstream. The additional information needed in each layer is allocated to that layer and then coded.

A header region for storing header information is placed at the front of the bitstream, information on layer 0 is packed after the header region, and then information belonging to layers 1 through N, which are enhancement layers, is packed in order. The span from the header region to the layer 0 information is referred to as the base layer, the span from the header region to the layer 1 information is referred to as layer 1, and the span from the header region to the layer 2 information is referred to as layer 2. Likewise, the uppermost layer is the span from the header region to the layer N information, that is, from the base layer to enhancement layer N. Additional information and coded audio data are stored as the information of each layer. For example, additional information 2 and coded quantized samples are stored as layer 2 information. Here, N is an integer greater than or equal to 1.
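Since each layer is a prefix of the frame (header plus layer 0 through layer k), a sender or decoder can drop enhancement layers simply by truncating the frame. The following minimal sketch computes the truncation point and the highest layer that fits a byte budget; the per-layer byte sizes, the budget, and the helper names are hypothetical and are not defined in this description.

```python
from typing import Sequence


def truncation_point(header_bytes: int, layer_bytes: Sequence[int], target_layer: int) -> int:
    # Bytes from the start of the frame through the information of target_layer.
    if not 0 <= target_layer < len(layer_bytes):
        raise ValueError("target layer out of range")
    return header_bytes + sum(layer_bytes[: target_layer + 1])


def highest_layer_within(header_bytes: int, layer_bytes: Sequence[int], budget_bytes: int) -> int:
    # Highest layer whose prefix fits the budget; -1 if even the base layer does not fit.
    best = -1
    for layer in range(len(layer_bytes)):
        if truncation_point(header_bytes, layer_bytes, layer) <= budget_bytes:
            best = layer
        else:
            break
    return best


sizes = [50, 20, 20]  # hypothetical sizes of the layer 0, 1 and 2 information
print(highest_layer_within(4, sizes, 80))  # 1
print(truncation_point(4, sizes, 1))       # 74
```

No re-encoding is needed to reach a lower bitrate, which is the practical benefit of the nested layer structure.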

FIG. 4 is a detailed diagram of the structure of additional information.

Referring to FIG. 4, additional information and coded quantized samples are stored as the information of an arbitrary layer. In the present embodiment, additional information includes Huffman coding model information, quantization factor information, additional information on channels, and other additional information. The Huffman coding model information is index information on the Huffman coding model to be used in coding or decoding the quantized samples belonging to the corresponding layer. Quantization factor information indicates a quantization step size for quantizing or inversely quantizing audio data belonging to the corresponding layer. Additional information on channels is information on the channels, such as M/S stereo. Other additional information is flag information indicating whether M/S stereo is employed.

In the present embodiment, the bit packing unit 14 performs differential coding of the Huffman coding model information and the quantization factor information. In the differential coding, the difference from the value of the immediately previous band is coded. Additional information on channels is Huffman-coded.

FIG. 5 is a reference diagram to explain schematically an encoding method according to the present invention.

Referring to FIG. 5, the quantized samples to be coded have a 3-layered structure. An obliquely hatched rectangle denotes a spectral line composed of quantized samples, solid lines indicate scale bands, and dotted lines indicate coding bands. Scale bands ①, ②, ③, ④ and ⑤ and coding bands ①, ②, ③, ④ and ⑤ belong to layer 0. Scale bands ⑤ and ⑥ and coding bands ⑥, ⑦, ⑧, ⑨ and ⑩ belong to layer 1. Scale bands ⑥ and ⑦ and coding bands ⑪, ⑫, ⑬, ⑭ and ⑮ belong to layer 2. Layer 0 is defined such that coding is performed up to frequency band ⑤, layer 1 such that coding is performed up to frequency band ⑩, and layer 2 such that coding is performed up to frequency band ⑮.

First, the quantized samples belonging to layer 0 are coded within a bit range of 100 bits using the corresponding coding model. Also, as additional information of layer 0, scale bands ① to ⑤ and coding bands ① to ⑤, which belong to layer 0, are coded. While the quantized samples are coded in units of symbols, the number of bits is counted. If the counted number of bits exceeds the allowed bit range, coding of layer 0 is stopped and layer 1 is coded. The quantized samples belonging to layer 0 that remain uncoded are coded later, when there is still room in the number of bits allowed for layers 0 and 1.

Next, the quantized samples belonging to layer 1 are coded using the coding model of whichever of the coding bands belonging to layer 1, that is, coding bands ⑥ to ⑩, contains the quantized samples to be coded. Also, as additional information of layer 1, scale bands ⑤ and ⑥ and coding bands ⑥ to ⑩, which belong to layer 1, are coded. If there is still room in the allowed bit range, that is, 100 bits, even after all the samples corresponding to layer 1 are coded, the uncoded bits remaining in layer 0 are coded until the allowed 100 bits are counted. If the number of bits counted during coding exceeds the allowed bit range, coding of layer 1 is stopped and coding of layer 2 is started.

Finally, the quantized samples belonging to layer 2 are coded using the coding model of whichever of the coding bands belonging to layer 2, that is, coding bands ⑪ to ⑮, contains the quantized samples to be coded. Also, as additional information of layer 2, scale bands ⑥ and ⑦ and coding bands ⑪ to ⑮, which belong to layer 2, are coded. If there is still room in the allowed bit range, that is, 100 bits, even after all the samples corresponding to layer 2 are coded, the uncoded bits remaining in layer 0 are coded until the allowed 100 bits are counted.

If all the quantized samples of layer 0 are coded without regard to the bit range allowed for layer 0, that is, if coding continues even after the number of coded bits exceeds the allowed bit range of 100, some of the bits allowed for the next layer, layer 1, are consumed by the current layer, and it is often the case that quantized samples belonging to layer 1 cannot be coded. Thus, in scalable decoding, if decoding is performed on the layers up to layer 1, not all the quantized samples up to frequency band ⑩, which corresponds to layer 1, have been coded, so the decoded quantized samples may fluctuate at frequencies below ⑩, resulting in a "Birdy" effect that degrades audio quality.

When the plurality of layers (target layers) is determined, the bit ranges are assigned in consideration of the total size of all the audio data to be decoded. Thus, no bits are left uncoded because of a shortage of the bit range in which they are to be placed.

Decoding is performed in the reverse manner of the coding process, and the number of bits is counted according to the allowed bit range. Thus, the point at which decoding of a given layer ends can be identified.
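The bit-range rule walked through above can be sketched as follows. This is a hypothetical illustration, not the bit packing unit 14 itself: it assumes the code length of every symbol is already known, that a symbol is placed only if it fits in full (the description says only that coding stops once the count exceeds the range), and that leftovers from lower layers are taken oldest first. `pack_layers` and `Slot` are illustrative names.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Slot = Tuple[int, int]  # (source layer, symbol index within that layer)


def pack_layers(layer_costs: Sequence[Sequence[int]], budget: int) -> List[List[Slot]]:
    """Assign coded symbols to layers under a per-layer bit budget.

    layer_costs[l][i] is the assumed code length in bits of the i-th symbol of
    layer l, already ordered from the MSB plane down to the LSB plane.
    """
    leftovers: Deque[Tuple[int, int, int]] = deque()  # uncoded lower-layer symbols
    packed: List[List[Slot]] = []
    for layer, costs in enumerate(layer_costs):
        used = 0
        slots: List[Slot] = []
        own = deque((layer, i, c) for i, c in enumerate(costs))
        # 1) Code this layer's own symbols until the next one would overflow.
        while own and used + own[0][2] <= budget:
            src, idx, cost = own.popleft()
            slots.append((src, idx))
            used += cost
        # 2) Only if all own symbols fit, spend the remaining room on leftovers.
        if not own:
            while leftovers and used + leftovers[0][2] <= budget:
                src, idx, cost = leftovers.popleft()
                slots.append((src, idx))
                used += cost
        leftovers.extend(own)  # symbols that did not fit wait for a later layer
        packed.append(slots)
    return packed


print(pack_layers([[40, 40, 40], [30, 30], [20]], budget=100))
# [[(0, 0), (0, 1)], [(1, 0), (1, 1), (0, 2)], [(2, 0)]]
```

Here the third layer 0 symbol does not fit in layer 0's 100 bits, so it is carried into layer 1 once layer 1's own symbols have been coded, mirroring the FIG. 5 walkthrough. A decoder counting bits against the same budgets would find the same layer boundaries.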

FIG. 6 is a reference diagram to explain more specifically an encoding 
method according to the present invention. 

According to the present invention, the bit packing unit 14 codes the quantized samples corresponding to each layer using bit-plane coding and Huffman coding. A plurality of quantized samples are mapped on a bit plane so that they are expressed in binary form, and are coded within the bit range allowed for each layer, in order from a symbol formed with MSBs down to a symbol formed with LSBs. Important information on a bit plane is coded first, and relatively less important information is coded later. By doing so, the bitrate and frequency band corresponding to each layer are fixed in the coding process, so that the distortion referred to as "a Birdy effect" can be reduced.

FIG. 6 illustrates an example of coding in the case where the number of bits of the symbols formed with MSBs is 4 or less. When quantized samples 9, 2, 4, and 0 are mapped on a bit plane, they are expressed in binary form as 1001b, 0010b, 0100b, and 0000b, respectively. That is, in the present embodiment, the size of a coding block, which is the coding unit on a bit plane, is 4*4.

The symbol formed with the MSBs, msb, is "1000b"; the symbol formed with the next bits, msb-1, is "0010b"; the symbol formed with the bits after that, msb-2, is "0100b"; and the symbol formed with the LSBs, msb-3, is "1000b".
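The bit-plane mapping of FIG. 6, and its inverse used by the decoder to obtain K quantized samples from a 4*K bit plane, can be sketched as below. This is an illustrative sketch only: the helper names are hypothetical, and it assumes non-negative magnitudes with the first sample supplying the leftmost bit of each symbol; sign handling is not described here and is not modelled.

```python
from typing import List, Sequence


def to_bit_planes(samples: Sequence[int], depth: int) -> List[int]:
    """Slice K non-negative samples into `depth` K-bit symbols, MSB plane first.

    The first sample supplies the leftmost (highest) bit of each symbol.
    """
    planes = []
    for bit in range(depth - 1, -1, -1):
        symbol = 0
        for s in samples:
            symbol = (symbol << 1) | ((s >> bit) & 1)
        planes.append(symbol)
    return planes


def from_bit_planes(planes: Sequence[int], k: int) -> List[int]:
    """Rebuild K samples from a depth*K bit plane (inverse of to_bit_planes)."""
    depth = len(planes)
    samples = [0] * k
    for p, symbol in enumerate(planes):
        bit = depth - 1 - p
        for i in range(k):
            if (symbol >> (k - 1 - i)) & 1:
                samples[i] |= 1 << bit
    return samples


planes = to_bit_planes([9, 2, 4, 0], depth=4)
print([format(p, "04b") for p in planes])  # ['1000', '0010', '0100', '1000']
print(from_bit_planes(planes, k=4))         # [9, 2, 4, 0]
```

Each symbol collects one bit position across all K samples, so truncating the symbol sequence after any plane still leaves a coarser approximation of every sample.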

Huffman model information for Huffman coding, that is, the codebook index, is given in Table 1.
According to Table 1, two models exist even for an identical significance level (msb in the present embodiment). This is because two models are generated for quantized samples that show different distributions.

A process for coding the example of FIG. 6 according to the table 1 will 
now be explained in more detail. 

In the case where the number of bits of a symbol is 4 or less, Huffman coding according to the present invention is given by equation 1:

Huffman code value = HuffmanCodebook[codebook index][higher bit plane][symbol]    (1)

That is, Huffman coding uses three input variables: a codebook index, a higher bit plane, and a symbol. The codebook index is a value obtained from Table 1, the higher bit plane is the symbol immediately above the symbol currently being coded on the bit plane, and the symbol is the symbol currently being coded.

Since the msb of the Huffman model is 4 in the example of FIG. 6, codebook indices 13-16 or 17-20 are selected. If the additional information to be coded is 8,
the codebook index of a symbol formed with msb bits is 16, 
the codebook index of a symbol formed with msb-1 bits is 15, 
the codebook index of a symbol formed with msb-2 bits is 14, and 
the codebook index of a symbol formed with msb-3 bits is 13. 
Meanwhile, since the symbol formed with msb bits has no higher bit plane, the value of its higher bit plane is taken as 0, and coding is performed with the code HuffmanCodebook[16][0b][1000b]. Since the higher bit plane of the symbol formed with msb-1 bits is 1000b, coding is performed with the code HuffmanCodebook[15][1000b][0010b]. Since the higher bit plane of the symbol formed with msb-2 bits is 0010b, coding is performed with the code HuffmanCodebook[14][0010b][0100b]. Since the higher bit plane of the symbol formed with msb-3 bits is 0100b, coding is performed with the code HuffmanCodebook[13][0100b][1000b].
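The three-variable lookup of equation 1 can be illustrated by generating the (codebook index, higher bit plane, symbol) triples for the FIG. 6 example. The Huffman tables of Table 1 are not reproduced in this description, so this hypothetical sketch builds only the lookup keys; decreasing the codebook index by one per plane follows the FIG. 6 example and is assumed, not confirmed, to hold in general.

```python
from typing import List, Sequence, Tuple


def huffman_lookup_keys(planes: Sequence[int], msb_codebook: int) -> List[Tuple[int, int, int]]:
    """Return (codebook index, higher bit plane, symbol) for each plane, MSB first."""
    keys = []
    higher = 0  # the msb symbol has no plane above it, so 0 is used
    for offset, symbol in enumerate(planes):
        keys.append((msb_codebook - offset, higher, symbol))
        higher = symbol  # the current symbol is the context for the next plane
    return keys


for key in huffman_lookup_keys([0b1000, 0b0010, 0b0100, 0b1000], msb_codebook=16):
    print(key)
# (16, 0, 8)
# (15, 8, 2)
# (14, 2, 4)
# (13, 4, 8)
```

Each triple would then index a table such as HuffmanCodebook[codebook index][higher bit plane][symbol]; using the plane above as context lets the code exploit the correlation between neighbouring bit planes.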

The bit packing unit 14 counts the number of coded bits and compares the count with the number of bits allowed in a layer; if the count exceeds the allowed number, coding stops. When room is available in the next layer, the remaining uncoded bits are coded and placed in the next layer. Conversely, if some of the bits allowed in a layer remain after all the quantized samples allocated to that layer have been coded, that is, if there is room in the layer, the quantized samples left uncoded after coding of the lower layer was finished are coded.
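The layer budget rule described above can be sketched as follows. This is an illustrative sketch only: codewords are represented just by their lengths in bits, the names `pack_layers`, `layer_code_lengths` and `layer_budgets` are hypothetical, and the ordering (a layer's own codewords first, then those left over from lower layers) follows the wording of the paragraph rather than a confirmed implementation.

```python
def pack_layers(layer_code_lengths, layer_budgets):
    """Distribute codewords over layers under per-layer bit budgets.

    layer_code_lengths[k] lists the lengths (in bits) of the codewords for
    the quantized samples allocated to layer k; layer_budgets[k] is the
    number of bits allowed in layer k. Returns (packed, leftover): packed[k]
    holds the (origin layer, codeword position) pairs placed in layer k, and
    leftover holds codewords that did not fit in any layer.
    """
    carry = []   # codewords left uncoded by lower layers
    packed = []
    for k, budget in enumerate(layer_budgets):
        # The layer's own codewords first, then those left over from below.
        queue = [(k, i, n) for i, n in enumerate(layer_code_lengths[k])] + carry
        used, placed = 0, []
        carry = []
        for j, (origin, i, n) in enumerate(queue):
            if used + n > budget:
                carry = queue[j:]   # budget exceeded: stop coding this layer
                break
            used += n
            placed.append((origin, i))
        packed.append(placed)
    return packed, [(origin, i) for origin, i, _ in carry]


packed, leftover = pack_layers([[4, 4, 4], [3, 3]], [10, 12])
print(packed, leftover)
```

In this usage example the third 4-bit codeword of layer 0 does not fit in the 10-bit budget, so it is carried over and placed in layer 1 after that layer's own two codewords.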

Meanwhile, if the number of bits of a symbol formed with msb bits is greater than or equal to 5, the Huffman code value is determined using the location of the current bit plane. In other words, if the significance is greater than or equal to 5, there is little statistical difference between the data on each bit plane, so the data on a given bit plane is Huffman-coded using the same Huffman model. That is, one Huffman model exists per bit plane.

If the significance is greater than or equal to 5, that is, if the number of bits of a symbol is greater than or equal to 5, Huffman coding according to the present invention satisfies Equation 2:

Huffman code = 20 + bpl (2)

where bpl indicates the index of the bit plane currently being coded and is an integer greater than or equal to 1. The constant 20 is added so that the indices start from 21, because the last index of the Huffman models corresponding to additional information 8, as listed in Table 2, is 20.

Therefore, the additional information for a coding band simply indicates the significance. In Table 2, Huffman models are determined according to the index of the bit plane currently being coded.
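The selection rule of Equation 2 can be sketched as below. This is a hedged illustration: the function name is hypothetical, and because Table 1 is not reproduced in this section, the branch for significance below 5 is delegated to a hypothetical `table1_lookup` callable whose signature is an assumption.

```python
def huffman_model_index(significance, bpl, table1_lookup):
    """Select a Huffman model index for a coding band.

    For significance >= 5 this follows Equation 2: one model per bit plane,
    numbered from 21 upward. Below 5 the index comes from Table 1, which is
    not reproduced here; table1_lookup is a hypothetical stand-in for it.
    """
    if significance >= 5:
        if bpl < 1:
            raise ValueError("bpl must be an integer greater than or equal to 1")
        return 20 + bpl
    return table1_lookup(significance, bpl)


print(huffman_model_index(6, 1, None))  # first bit plane of a wide symbol
```

Only the significance >= 5 branch is taken from the text; the call above therefore never touches the Table 1 stand-in.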
For the quantization factor information and the Huffman model information in the additional information, DPCM is performed across the coding bands to which the information corresponds. When the quantization factor information is coded, the initial value of the DPCM is expressed with 8 bits in the header information of a frame. The initial value of the DPCM for the Huffman model information is set to 0.
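A minimal sketch of the DPCM described above, applied to one value per coding band. The function names and the sample values are hypothetical; the 8-bit range check reflects the header field mentioned in the text, while how the encoder chooses the initial quantization factor value is an assumption left to the caller.

```python
def dpcm_encode(values, initial):
    """Return per-band differences, each relative to the previous band."""
    if not 0 <= initial <= 0xFF:
        raise ValueError("initial value must fit in the 8-bit header field")
    diffs, prev = [], initial
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs


def dpcm_decode(diffs, initial):
    """Invert dpcm_encode by accumulating the differences."""
    values, prev = [], initial
    for d in diffs:
        prev += d
        values.append(prev)
    return values


# Quantization factors: initial value sent in the header (assumed 40 here).
factor_diffs = dpcm_encode([40, 42, 41, 45], 40)
# Huffman model information: initial value fixed at 0.
model_diffs = dpcm_encode([3, 3, 5, 4], 0)
print(factor_diffs, model_diffs)
```

Small differences between neighboring coding bands are what make the subsequent entropy coding of this side information cheap.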

The coding method according to the present invention differs from the prior art BSAC technique as follows. First, the BSAC technique codes in units of bits, while the present invention codes in units of symbols. Second, the BSAC technique uses arithmetic coding, while the present invention uses Huffman coding. Arithmetic coding provides a higher compression gain but increases complexity and cost. Accordingly, in the present invention, data is Huffman-coded in units of symbols rather than in units of bits, so that complexity and cost decrease.

In order to control the bitrate, that is, to provide scalability, the bitstream corresponding to one frame is cut off according to the number of bits allowed in each layer, so that decoding is possible with only a small amount of data. For example, if only the bitstream corresponding to 48 kbps is to be decoded, only 1048 bits of the bitstream are used, and decoded audio data corresponding to 48 kbps is obtained.
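Cutting a frame at a layer boundary can be sketched as follows. This is a hypothetical illustration: the function name and `layer_bits` are assumptions, and it further assumes that the layers of a frame are stored back to back from the first bit, most significant bit first. The layer sizes in the usage line are invented for the example and are not the 1048-bit figure above.

```python
def cut_frame(frame: bytes, layer_bits, target_layer):
    """Keep only the bits of layers 0..target_layer of one frame.

    Assumes the layers are stored back to back from the start of the frame,
    most significant bit first, and layer_bits[k] is the size of layer k.
    Returns the truncated bytes and the number of valid bits in them.
    """
    keep = sum(layer_bits[:target_layer + 1])
    if keep > len(frame) * 8:
        raise ValueError("frame is shorter than the requested layers")
    out = bytearray(frame[:(keep + 7) // 8])
    if keep % 8:
        # Zero the bits of the last byte that belong to higher layers.
        out[-1] &= (0xFF << (8 - keep % 8)) & 0xFF
    return bytes(out), keep


data, nbits = cut_frame(bytes([0xFF] * 4), [10, 6, 16], 0)
print(data.hex(), nbits)
```

Because the higher layers are simply dropped, no re-encoding is needed to lower the bitrate of a stored or transmitted frame.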

The coding and decoding methods according to the present invention 
based on the structure described above will now be explained. 

The coding apparatus reads PCM audio data, stores the data in a memory (not shown), and obtains masking thresholds and additional information from the stored PCM audio data through psychoacoustic modeling. Since the PCM audio data is a time-domain signal, it is wavelet-transformed into a frequency-domain signal. The coding apparatus then obtains quantized samples by quantizing the wavelet-transformed signal according to the quantization band information and the quantization factor information. As described above, the quantized samples are coded and packed through bit-sliced coding, symbol-based coding, and Huffman coding.

FIG. 7 is a flowchart for explaining an encoding method according to a 
preferred embodiment of the present invention. 

Referring to FIG. 7, the process in which the bit packing unit 14 of the coding apparatus codes and packs the quantized samples will now be explained.

First, the bit packing unit 14 extracts the information corresponding to each layer based on the given target bitrate and the additional information. This is performed in steps 701 through 703. In detail, a cut-off frequency that serves as the basis for cut-off in each layer is obtained in step 701, the quantization band information and coding band information corresponding to each layer are obtained in step 702, and the bit range within which the bits of each layer must be coded is allocated in step 703.
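One possible way to derive the per-layer bit ranges of step 703 is sketched below. The text does not say how the ranges are computed, so everything here is an assumption: each layer is described by the cumulative bitrate reached once it is decoded, frames have a fixed number of samples, and fractional bits are rounded down. The function name and the example numbers are hypothetical.

```python
def allocate_bit_ranges(layer_kbps, frame_length, sample_rate):
    """Return (start, end) bit offsets of each layer within one frame.

    layer_kbps[k] is the cumulative bitrate reached once layer k is decoded.
    A frame of frame_length samples at sample_rate Hz lasts
    frame_length / sample_rate seconds, so it may hold
    kbps * 1000 * frame_length / sample_rate bits (rounded down here).
    """
    ranges, start = [], 0
    for kbps in layer_kbps:
        end = kbps * 1000 * frame_length // sample_rate
        if end < start:
            raise ValueError("cumulative bitrates must be non-decreasing")
        ranges.append((start, end))
        start = end
    return ranges


print(allocate_bit_ranges([16, 24, 32], 1024, 32000))
```

With these hypothetical numbers, each 1024-sample frame at 32 kHz lasts 32 ms, so a 16 kbps base layer gets 512 bits and each 8 kbps enhancement layer gets 256 more.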

Then, the layer index is set to the base layer in step 704, and additional information, including quantization band information and coding band information, is coded in step 705.

Next, in step 706, the quantized samples corresponding to the base layer are mapped onto a bit plane and coded in units of 4*4 blocks, starting from the symbol formed with msb bits. The number of coded bits is counted, and if the count exceeds the bit range of the current layer in step 707, coding in the current layer is stopped and coding begins in the next layer. If the counted number of bits does not exceed the bit range in step 707, the procedure returns to step 705 for the next layer in step 709. Since the base layer has no lower layer, step 708 is not performed for it, but step 708 is performed for the layers following the base layer. Through the above steps, all layers up to the target layer are coded.

Step 706, that is, the step of coding the quantized samples, is performed as follows:

1. Quantized samples corresponding to a layer are grouped in units of N samples and mapped onto a bit plane.

2. Huffman coding is performed starting from the symbol formed with the msb bits of the mapped binary data.

Substep 2 can be explained in more detail as follows:

2.1 A scalar value (curVal) corresponding to the symbol to be coded is obtained.

2.2 A Huffman code corresponding to the scalar value (upperVal) of the symbol in the higher bit plane, that is, the symbol located above the symbol currently being coded in the bitstream, is obtained.
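Substeps 1 through 2.2 can be illustrated with the following sketch, which produces the (curVal, upperVal) pair for every bit plane of every group. It is a hypothetical helper, not the invention's code: it assumes non-negative magnitudes (sign handling is not covered), zero padding of a final incomplete group, and that the first sample of a group supplies the leftmost bit of each symbol. The Huffman lookup itself is omitted.

```python
def bit_plane_pairs(samples, n, msb):
    """Group quantized magnitudes in units of n samples, map each group onto
    bit planes, and return, per group, the (curVal, upperVal) pairs from the
    msb plane down to the LSB plane.

    Assumes non-negative magnitudes and that the first sample of a group
    supplies the leftmost bit of each symbol.
    """
    groups = []
    for g in range(0, len(samples), n):
        # Pad an incomplete final group with zeros.
        group = list(samples[g:g + n]) + [0] * max(0, g + n - len(samples))
        pairs, upper_val = [], 0   # nothing lies above the msb plane
        for bit in range(msb - 1, -1, -1):
            cur_val = 0
            for s in group:
                cur_val = (cur_val << 1) | ((s >> bit) & 1)
            pairs.append((cur_val, upper_val))
            upper_val = cur_val
        groups.append(pairs)
    return groups


# Magnitudes whose bit planes give the symbols 1000b, 0010b, 0100b, 1000b.
print(bit_plane_pairs([9, 2, 4, 0], 4, 4))
```

For the samples 9, 2, 4 and 0 with an msb of 4, the symbols are 1000b, 0010b, 0100b and 1000b, the same sequence used in the FIG. 6 example.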

FIG. 8 is a flowchart for explaining a decoding method according to a 
preferred embodiment of the present invention. 

Referring to FIG. 8, the decoding apparatus receives a bitstream formed with audio data coded in a layered structure and decodes the header information in each frame. Then, additional information, including scale factor information and coding model information corresponding to the first layer, is decoded in step 801. Referring to the coding model information, quantized samples are obtained in step 802 by decoding the bitstream in units of symbols, in order from the symbol formed with msb bits down to the symbol formed with LSB bits. The obtained quantized samples are inversely quantized by referring to the scale factor information in step 803, and the inversely quantized samples are inverse-transformed in step 804. Steps 801 through 804 are repeated, increasing the layer number by one each time, until decoding up to a predetermined target layer is finished.
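The last part of step 802, turning decoded bit-plane symbols back into quantized samples, can be sketched as follows. This is a hypothetical helper covering only that mapping; Huffman decoding, inverse quantization and the inverse transform are omitted. It assumes the same bit order as at the encoder (the first sample supplies the leftmost bit of each symbol) and treats bit planes that were not received, because the bitstream was cut off at a lower layer, as zero bits.

```python
def samples_from_symbols(symbols, n, msb):
    """Rebuild n quantized magnitudes from bit-plane symbols decoded in
    order from the msb plane downward. Planes that were not received are
    treated as zero bits, which yields a coarser approximation.
    """
    if len(symbols) > msb:
        raise ValueError("more symbols than bit planes")
    samples = [0] * n
    for level, symbol in enumerate(symbols):
        bit = msb - 1 - level          # bit position of this plane
        for i in range(n):
            # The first sample holds the leftmost bit of the symbol.
            if (symbol >> (n - 1 - i)) & 1:
                samples[i] |= 1 << bit
    return samples


print(samples_from_symbols([0b1000, 0b0010, 0b0100, 0b1000], 4, 4))
print(samples_from_symbols([0b1000, 0b0010], 4, 4))  # only two planes received
```

Decoding fewer planes leaves the low-order bits at zero, which is how a truncated bitstream still yields usable, if less precise, quantized samples.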

FIG. 9 is a flowchart for explaining a decoding method according to another preferred embodiment of the present invention.

Referring to FIG. 9, a bitstream formed with audio data coded in a layered structure is received, and the cut-off frequency corresponding to each layer is decoded from the header information in each frame in step 901. In step 902, the quantization band information and coding band information corresponding to each layer are identified by decoding the header information. In step 903, the allowed bit range for each layer is identified. In step 904, the layer index is set to the base layer. Additional information on the base layer is decoded in step 905, and in step 906, quantized samples are obtained by decoding the bitstream in units of symbols, within the bit range allowed in each layer, in order from the symbol formed with msb bits down to the symbol formed with LSB bits. In step 907, it is checked whether the current layer is the last one. Steps 905 and 906 are repeated, increasing the layer number by one each time, until the predetermined target layer is reached. Instead of obtaining the cut-off frequency, quantization band information, coding band information, and bit range from the header information stored in each frame of the received bitstream in steps 901 through 903, the decoding apparatus may hold this information in advance. In this case, the decoding apparatus obtains the information by reading the stored information.

According to the present invention as described above, by coding the bits in units of symbols after bit slicing, scalability that allows the bitrate to be controlled in a top-down manner is provided, while the amount of computation in the coding apparatus is not much greater than that of an apparatus that does not provide scalability. That is, the present invention provides a method and apparatus for coding/decoding audio data with scalability that have lower complexity while providing FGS even in a lower layer.

In addition, compared to the MPEG-4 Audio BSAC technique, which uses arithmetic coding, the coding/decoding apparatus of the present invention, which uses Huffman coding, reduces the amount of computation in the bit packing/unpacking processes to one eighth of that of the BSAC technique. Even when bit packing according to the present invention is performed to provide FGS, the overhead is small, so the coding gain is similar to that obtained when scalability is not provided.

Also, since the apparatus according to the present invention has a layered structure, the process of regenerating a bitstream so that a server side can control the bitrate is very simple, and accordingly, the complexity of an apparatus for transformation coding is low.

When an audio stream is transmitted through a network, the transmission bitrate can be controlled according to the user's selection or the network conditions, so that uninterrupted service can be provided.

Further, when the audio stream is stored in an information storage medium having a limited capacity, the file size can be adjusted arbitrarily before storage. If the bitrate becomes low, the band is restricted. Accordingly, the complexity of the filter, which accounts for most of the complexity of a coding/decoding apparatus, decreases greatly, and the actual complexity of the coding/decoding apparatus decreases as the bitrate decreases.